Problem 4: Term Weighting Schemes in Information Retrieval
Author
Abstract
Information retrieval is the process of evaluating a user's query, or information need, against a set of documents (books, journal articles, web pages, etc.) to determine which of the documents satisfy the query. With the advent of the World Wide Web, there is suddenly a need to query enormous sets of documents both efficiently and accurately. In the vector space model of information retrieval, documents are represented by sparse vectors, each component of which corresponds to a term, usually a word, in the document set. In the simplest case, the components of these vectors are the raw frequency counts of each term in each document. More sophisticated term weighting schemes are used to improve information retrieval accuracy. We study a specific term weighting scheme (log-entropy weighting) to determine its effectiveness on different aspects of retrieval. New approaches to term weighting are also examined. In addition, we describe our workshop experience and some of our technical work.
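The abstract names log-entropy weighting without stating its formula. The sketch below assumes the commonly cited formulation: a local weight log(1 + f_ij) for the raw count f_ij of term i in document j, multiplied by a global weight g_i = 1 + (sum over j of p_ij log p_ij) / log n, where p_ij = f_ij / gf_i, gf_i is the total count of term i in the collection, and n is the number of documents. The function names, the whitespace tokenizer, and the use of natural logarithms are illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def term_document_counts(docs):
    """Return (vocab, counts); counts[j] maps term -> raw frequency in document j."""
    # Naive lowercase whitespace tokenization (an assumption for illustration).
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    return vocab, counts


def log_entropy_weights(docs):
    """Return (weighted, global_weight) under the assumed log-entropy formulation.

    weighted[j] is a sparse vector (dict) of term -> weight for document j.
    """
    vocab, counts = term_document_counts(docs)
    n = len(docs)
    global_weight = {}
    for term in vocab:
        gf = sum(c[term] for c in counts)  # total occurrences in the collection
        entropy_sum = 0.0
        for c in counts:
            if c[term] > 0:
                p = c[term] / gf
                entropy_sum += p * math.log(p)
        # With a single document the entropy term is undefined; fall back to 1.
        global_weight[term] = (1.0 + entropy_sum / math.log(n)) if n > 1 else 1.0
    weighted = [
        {term: math.log(1 + freq) * global_weight[term] for term, freq in c.items()}
        for c in counts
    ]
    return weighted, global_weight


docs = ["apple banana apple", "banana cherry", "banana date"]
weighted, global_weight = log_entropy_weights(docs)
```

Under this formulation, a term spread evenly over every document (here "banana") receives a global weight of 0, while a term confined to a single document (here "apple") receives a global weight of 1, so the scheme favors terms that discriminate between documents.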
Similar resources
Weighting in Information Retrieval Using Genetic Programming: A Three Stage Process
This paper presents term-weighting schemes that have been evolved using genetic programming in an ad hoc Information Retrieval model. We create an entire term-weighting scheme by firstly assuming that term-weighting schemes contain a global part, a term-frequency influence part and a normalisation part. By separating the problem into three distinct phases we reduce the search space and ease the ...
The Effect of Term Importance Degree on Text Retrieval
Various approaches to index term-weighting have been investigated. In fact, term-weighting is an indispensable process for document ranking in most retrieval systems. As well actual information retrieval systems have to deal with explosive growth of documents of various sizes and terms of various frequencies because an appropriate term-weighting scheme has a crucial impact on the overall perfor...
Effective Term Weighting for Sentence Retrieval
A well-known challenge of information retrieval is how to infer a user’s underlying information need when the input query consists of only a few keywords. Question Answering (QA) systems face an equally important but opposite challenge: given a verbose question, how can the system infer the relative importance of terms in order to differentiate the core information need from supporting context?...
Semi-parametric and Non-parametric Term Weighting for Information Retrieval
Most of the previous research on term weighting for information retrieval has focused on developing specialized parametric term weighting functions. Examples include TF.IDF vector-space formulations, BM25, and language modeling weighting. Each of these term weighting functions takes on a specific parametric form. While these weighting functions have proven to be highly effective, they impose st...
A new term-weighting scheme for text classification using the odds of positive and negative class probabilities
Text classification is a core technique for text mining and information retrieval. It has been applied to many applications in many different research and industrial areas. Term weighting schemes have to assign an appropriate weight to each term to obtain a high text classification performance. Although term weighting is one of the important modules for text classification, and text classificat...